Abstract

This is a proposal to create a research facility for the development of
a high-parallel version of the SP machine, based on the SP theory of
intelligence. We envisage that the new version of the SP machine will be an
open-source software virtual machine, derived from the existing SP computer
model, and hosted on an existing high-performance computer. It is intended
as a means for researchers everywhere to explore what can be done with the
system and to create new versions of it. The SP system is a unique attempt
to simplify and integrate observations and concepts across artificial
intelligence, mainstream computing, mathematics, and human perception and
cognition, with information compression via the matching and unification
of patterns as a unifying theme. Many potential benefits and applications
flow from this simplification and integration. These include potential to
help solve problems associated with big data; potential to facilitate the
development of autonomous robots; and potential in other areas including:
unsupervised learning, natural language processing, several kinds of
reasoning, fuzzy pattern recognition and recognition at multiple levels of
abstraction, computer vision, best-match and semantic forms of information
retrieval, software engineering, medical diagnosis, simplification of
computing systems, and the seamless integration of diverse kinds of
knowledge and diverse aspects of intelligence. Additional motivations for
creating the proposed facility include the potential of the SP system to
help solve problems in defence, security, and the detection and prevention
of crime; the potential impact of the development in terms of economic,
social, environmental, and academic criteria, and in terms of publicity;
and the potential for international influence in research. The main
elements of the proposed facility are described, including support for the
development of SP-neural, a neural version of the SP machine. The facility
should be permanent in the sense that it should be available for the
foreseeable future, and it should be designed to facilitate its use by
researchers anywhere in the world.


1 Introduction 

“There is nothing more practical than a good theory”, Kurt Lewin [7,
p. 169].

This is a proposal to create a high-parallel version of the SP machine, which we 
will refer to as “SPM”, and a research facility for its further development, which we 
will refer to as “SPF”. It is envisaged that SPM will be created as an open-source 
software virtual machine, derived from the existing SP computer model (SP71), 
and hosted on an existing high-performance computer. SPF is intended to be a 
means for researchers everywhere to explore what can be done with the SP system 
and to create new versions of it. 

1.1 The SP theory and the SP machine 

The basis of the proposed development is the SP theory of intelligence. This 
is a unique attempt to simplify and integrate observations and concepts across 
artificial intelligence, mainstream computing, mathematics, and human perception 
and cognition, with information compression via the matching and unification of 
patterns as a unifying theme.¹

Many potential benefits and applications flow from this simplification and
integration, as detailed in Section 2 and Appendix C. Several of those benefits and
applications may be realised on relatively short timescales, as indicated in the
introduction to Appendix C.

The SP theory was not dreamed up overnight. It is the product of about 20 
years of development and testing. 

1 The name “SP” is short for Simplicity and Power, because compression of any given body
of information, I, may be seen as a process of reducing “redundancy” of information in I and
thus increasing its “simplicity”, whilst retaining as much as possible of its non-redundant
descriptive and explanatory “power”.

A central idea in the SP theory is the powerful concept of multiple alignment,
borrowed and adapted from that concept in bioinformatics, and outlined in
Appendix A.2. It appears that multiple alignment provides a key to versatility and
adaptability in intelligent systems.

Expressing the theory in a computer model has helped to reduce vagueness in 
the theory, it has provided a means of testing candidate ideas—many of which 
have been rejected as a result of this testing—and it is a means of demonstrating 
what can be done with the system. 

There is an outline of the SP theory and the SP computer model in Appendix
A, with pointers to where fuller information may be found. Distinctive features
and apparent advantages of the SP system are summarised in Appendix B. Some of
the potential benefits and applications of the SP system are summarised in Section
2 and Appendix C, with pointers to more comprehensive sources of information.

The wide scope of the SP system, and the wide range of its potential benefits
and applications, mean that there are many more avenues to be explored than
could be tackled by any one research group. Hence the need for SPF. It is intended 
to facilitate large-scale collaboration, very much in the spirit of: 


“The creation of this new era of [cognitive] computing is a monumental 
endeavor ... no company can take on this challenge alone. So we look to 
our clients, university researchers, government policy makers, industry 
partners, and entrepreneurs—indeed the entire tech industry—to take 
this journey with us.” [6, Preface].


“... fund activities that support integration and collaboration within
the research community; for instance, CPPs [Collaborative Computational
Projects], consortia, networks, international interactions.”²


1.2 Presentation 

In what follows, we’ll first describe some motivations for creating SPF (Section 2).
Then we’ll describe the facility itself and how it may be used (Section 3). There
is an estimate of costs in Section 4 and concluding remarks in Section 5.


2 Motivations 

In terms of policies of the UK Government and the UK Research Councils, SPM 
and SPF tick several boxes, and there are other reasons for creating SPF and 
developing SPM. These motivations are described in subsections below. 

2 From p. 9 in EPSRC E-Infrastructure Roadmap, August 2014, bit.ly/ld8Eh3I

2.1 The power and potential of multiple alignment

Probably the most distinctive feature of the SP system is the powerful concept of
multiple alignment, as it has been adapted from bioinformatics (Appendix B). In
broad terms, this concept provides a key to:


• Simplification of concepts and systems. Diverse concepts in computing and
cognition may be assimilated to the relatively simple concept of multiple
alignment ([19], m Section2 and 4]), and that concept can mean substantial
simplifications of computing systems ([19, Section 4.4.2], [20, Section 5]).

• Versatility in intelligence. The SP system has strengths in unsupervised
learning, natural language processing, pattern recognition, information
retrieval, several kinds of reasoning, planning, problem solving, information
compression, and more [19, 21].

• Adaptability in intelligence. The SP system’s strengths in unsupervised
learning [19, Chapter 9] may provide a foundation for other kinds of
learning [23, Section V]. Via unsupervised learning, and perhaps via other kinds
of learning, the system has potential to promote human-like adaptability in
intelligence.

• Seamless integration of structures and functions. The SP system, with
multiple alignment centre stage, has potential to promote seamless integration
of diverse kinds of knowledge and diverse aspects of intelligence [26, Section
7], an integration that appears to be essential for effective reasoning and
problem solving in intelligent systems.


More specifically, there are many potential benefits and applications that flow from
the multiple alignment concept, described in outline in Appendix C and in more
detail in [T91 |2Sl [2U E31 EDI US]. As noted in Section 2.6, the potential value,
worldwide, of those benefits and applications has been estimated to be at least
$190 billion, every year.

The estimated cost of setting up SPF is £634,700 and recurrent costs are
estimated to be £173,800 pa (Section 4).

Taking account of these estimated costs, and the potential benefits and
applications, it appears that the proposed facility would be a good investment.

2.2 UK industrial strategy


The development of SPF would chime well with the goal of “supporting emerging
technologies”, one of five main themes in the UK’s industrial strategy.³

More specifically, the SPF fits well with the Government’s “information
economy strategy”⁴ and its focus on “eight ‘great technologies’ which will propel the
UK to future growth”.⁵

The SP system has considerable potential to contribute to development of two
of those “great technologies”: big data (discussed in the next subsection) and
robotics and autonomous systems (discussed in Section 2.4). The SP system is
also relevant to development in a third area, energy, discussed briefly in Section
2.5.

In addition, the very broad scope of the SP system (Section 2.1), and the
provision of a facility for researchers anywhere in the world (Section 3.4), are likely,
via synergies of various kinds, to encourage “better science”, “new kinds of science”,
and “interdisciplinary projects”, in keeping with the Government’s A Strategic
Vision for UK e-Infrastructure.⁶


2.3 Big data 

Outlined here are some significant challenges associated with big data and how 
the SP system may help to overcome them. 

2.3.1 Challenges 

There is a tendency amongst some people to write or speak as if big data were an
unalloyed blessing and that all the necessary technologies are available to exploit
it. But it is clear from the book Smart Machines by John Kelly and Steve Hamm
(both of IBM) [6], and also from Frontiers in Massive Data Analysis from the US
National Research Council [9], that much of the potential value of big data cannot
be realised currently because of significant unsolved problems in that area.

It is also clear from Smart Machines that significant problems associated with
big data cannot be solved by tinkering with existing designs for computers. Radical
rethinking of the architecture of computers will be needed, probably drawing on
what we can learn about the workings of the human brain.

And achieving this transition to “cognitive computing” will require large-scale
collaboration, as indicated in Section 1.1.

3 See Industrial strategy: government and industry in partnership, GOV.UK, 2013-08-06,
bit.ly/lkYdtBS.

4 See Information Economy Strategy (PDF), June 2013, bit.ly/ld5tn8h.

5 See £600 million investment in the eight great technologies, Department for Business,
Innovation & Skills, 2013-01-24, bit.ly/QFwhwi; and “The ‘eight great technologies’ which will
propel the UK to future growth receive a funding boost”, speech given by the Rt Hon David
Willetts MP, former Minister for Universities and Science, GOV.UK, 2013-01-23,
bit.ly/lwKCqll.

6 Report from the Department for Business, Innovation and Skills, January 2012,
bit.ly/lFrAQzW, p. 21.


2.3.2 Potential solutions 


SPM and SPF are relevant to the issues just described because: the SP system has 
been developed in the spirit of cognitive computing, drawing heavily on research 
on human perception and cognition, and neuroscience; it offers radical solutions 
to problems posed by big data; and the existence of SPF would encourage and 
facilitate large-scale collaboration amongst researchers all around the world. 

The paper Big data and the SP theory of intelligence [23] describes how the SP
system may help to solve nine of the problems associated with big data, as outlined
in Appendix C.13.1. Three of those problems, and their potential solutions, are
highlighted here:


• The problem of variety in big data. There is a pressing need in computing to
tame the great variety of formalisms and formats for the representation of
knowledge, each with their own mode of processing. The multiple alignment
framework in the SP system has clear potential as a universal framework for
the representation and processing of diverse kinds of knowledge (UFK) [23,
Section III].

• Energy consumption. The SP system, in conjunction with the concept of
“data-centric computing” [6, Chapter 5], has potential to “make computers
many orders of magnitude more energy efficient” [6, p. 88] via two key
principles [23, Section IX]: 1) Taking advantage of statistical information that the
system gathers as a by-product of how it works; 2) Cutting out much
searching by making direct connections between “neural symbols” in “SP-neural”
(Appendix A.6).

• Transmission of information. The SP system has potential to yield
dramatic economies in the transmission of information, partly by making big
data smaller [23, Section VII], but perhaps more importantly via
analysis/synthesis [11, Chapter 18], based on the judicious separation of encoding
and grammar [23, Section VIII].


As noted in Appendix C.13.1, it appears that, considering the nine proposed
solutions collectively, and in several cases individually, there are no alternatives
that can rival the potential of what is described in [23].

With regard to the Government’s information economy strategy,⁷ “The UK
now has the opportunity to take a lead in the global efforts to deal with the
volume, velocity and variety of data created each day. This will require continued
infrastructure investment ...” (p. 7). SPF has potential to be a major plank in
that infrastructure, and at modest cost (Section 4).

7 Information Economy Strategy (PDF), June 2013, bit.ly/ld5tn8h.

2.4 Robotics and autonomous systems 

As mentioned above, robotics and autonomous systems is another of the
Government’s “eight great technologies” where the SP system may make a contribution,
perhaps via the UK Robotics and Autonomous Systems Network (UK-RAS
Network)⁸ or otherwise, and the potential is considerable.

In the paper Autonomous robots and the SP theory of intelligence [23], robots 
are “autonomous” if they do not depend on external intelligence or power supplies, 
and are mobile. It is also assumed that they are designed to exhibit as much 
human-like intelligence as possible. The paper describes how the SP system may 
contribute in three main areas: 

• Computational efficiency, the use of energy, and the size and weight of
computers. “To field a conventional computer with [human-like] cognitive
capacity would require gigawatts of electricity and a machine the size of a football
field.” [6, p. 75]. If a robot is to be autonomous as described above, it needs
a ‘brain’ that is efficient enough to do all the necessary processing without
external assistance, does not require an industrial-scale power station to
meet its energy demands, and is small enough and light enough to be carried
around, things that are difficult or impossible to achieve with current
technologies.

Section III of [23] describes how these things may be achieved in a revised
and updated version of arguments in [23, Section IX]. The SP system may
help: by reducing the size of data to be processed; by exploiting statistical
information that the system gathers as an integral part of how it works; and
via a revised version of Donald Hebb’s [5] concept of a “cell assembly”.

• Towards human-like versatility in intelligence. If a robot is to operate
successfully in an environment where people cannot help, or where such
opportunities are limited, it needs as much as possible of the versatility in
intelligence that people may otherwise provide.

The SP system demonstrates versatility via its strengths in areas such as
unsupervised learning, natural language processing, fuzzy pattern recognition
and recognition at multiple levels of abstraction, best-match and semantic
forms of information retrieval, several kinds of reasoning, planning, problem
solving, information compression, and more [23, Section IV].

But the SP system is not simply a kludge of different AI functions. Owing
to its focus on simplification and integration of concepts in computing and
cognition (Appendix A), it promises to reduce or eliminate unnecessary
complexity and to avoid awkward incompatibilities between poorly-integrated
subsystems, as indicated in Appendix C.1.

• Towards human-like adaptability in intelligence. The SP system’s strengths
in unsupervised learning and other aspects of intelligence may help to achieve
human-like adaptability in intelligence via: one-trial learning; the learning
of natural language; learning to see; building 3D models of objects and of
a robot’s surroundings; learning regularities in the workings of a robot and
in the robot’s environment; exploration and play; learning major skills; and
learning via demonstration [23, Section V].

This new approach to the development of autonomous robots has several
distinctive features (Appendix B), especially the concept of multiple alignment. It
appears that this approach has considerable potential to promote computational
and energy efficiency in a robot’s brain, and to achieve human-like versatility and
adaptability in intelligence.

8 See, for example, “Widespread backing for UK robotics network”, The Engineer,
2015-06-24, bit.ly/lBR9qE2.


2.5 Energy 

The SP system may also contribute indirectly to another of the areas identified
as one of the “eight great technologies”, energy, via increases in efficiency in the
workings of computers and in the transmission of data, as outlined in Sections 2.3
and 2.4, and described more fully in [24, Sections VIII and IX] and also in
[23, Section III].


2.6 Other potential benefits and applications of the SP 
system 

In addition to big data (Section 2.3), robotics and autonomous systems (Section
2.4), and energy (Section 2.5), there are several other potential benefits and
applications of the SP system, outlined in Appendix C. As noted there, several of these
potential benefits and applications may be realised on relatively short timescales
with existing high-performance computers or even ordinary computers.

Peer-reviewed papers about potential benefits and applications that have not
already been mentioned include [22] (application of the SP theory to the
understanding of natural vision and the development of computer vision), [18]
(application of the SP system to medical diagnosis), [20] (the SP system as an
intelligent database), and [26] (several other potential benefits and applications
including unsupervised learning, natural language processing, pattern recognition,
software engineering, information compression, the semantic web, bioinformatics,
data fusion, simplification of computing systems, and the seamless integration of
structures and functions in diverse kinds of knowledge).

In view of the wide scope of the SP system, and evidence of its potential, it
seems reasonable to estimate that it could add at least 5% to the value of IT
investments, worldwide. Since these are about $3.8 trillion annually,⁹ the value of
the SP concepts, every year, would be at least $190 billion [26, Section 8].
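
The arithmetic behind this estimate can be written out explicitly, taking the 5% uplift and the $3.8 trillion figure for annual IT investment as given:

```latex
\[
0.05 \times \$3.8 \times 10^{12}
  \;=\; \$1.9 \times 10^{11}
  \;=\; \$190\ \text{billion per year}.
\]
```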


2.7 Defence, security, and the detection and prevention of 
crime 

With regard to “Security in a changing world”, part of the “vision” [13, p. 13] of
the UK’s Science & Technology Facilities Council (STFC), several of the strengths
of the SP system that have been mentioned are relevant to defence, security, and
the detection and prevention of crime. These include:

• Big data (Section 2.3), and data fusion [23, Section 6.10.5].

• Robotics and autonomous systems (Section 2.4).

• Natural language processing [13, Chapter 5], [23, Section 6.2].

• Pattern recognition (see Appendix C.5 and [19, Chapter 6]).

• The SP system may function as an intelligent database [20].

• Computer vision.

• Several kinds of reasoning, including one-step ‘deductive’ reasoning,
abductive reasoning, reasoning with probabilistic networks and trees, reasoning
with if-then rules, nonmonotonic reasoning, ‘explaining away’, and causal
diagnosis [19, Chapter 7].

• Planning and problem solving [19, Chapter 8].

• Unsupervised learning [19, Chapter 9], [23, Section 6.1].

• The potential of the system to simplify computing systems [26, Sections 4
and 5].

• The potential of the system to promote seamless integration of diverse forms
of knowledge and diverse aspects of intelligence [26, Section 7].

This last point is particularly relevant to forensic work and security work,
because of the need to remove artificial barriers between different aspects of
intelligence, different kinds of knowledge, and different kinds of processing, so that all
kinds of intelligence and knowledge can be brought to bear in solving crimes and in
the prevention of acts of terrorism. Similar things may be said about autonomous
robots that aspire to human-like versatility and adaptability in intelligence.

9 See, for example, “Gartner: Big data will help drive IT spending to $3.8 trillion in 2014”,
Info World, 2013-01-03, bit.ly/ZOOSBr.


2.8 Contributing to the work of the planned Alan Turing 
Institute for Data Science and the planned Cognitive 
Computing Research Centre 


“The Alan Turing Institute for Data Science will benefit from a £42
million government investment over 5 years that will strengthen the
UK’s aims to be a world leader in the analysis and application of big
data. It will also ensure that the UK is at the forefront of data-science
in a rapidly moving, globally competitive area, enabling first-class
research in an environment that brings together theory and practical
application.”¹⁰

The SP system and SPF have potential to contribute to the work of the planned
Alan Turing Institute for Data Science, and also the planned Cognitive Computing
Research Centre,¹¹ by facilitating both the analysis and application of big data
(Section 2.3).


2.9 Impact 

“The Higher Education Funding Council for England (HEFCE), Research
Councils UK (RCUK) and Universities UK (UUK) have a shared
commitment to support and promote a dynamic and internationally
competitive research and innovation base that makes an increased and
sustainable contribution, both nationally and globally, to economic
growth, wellbeing, and the expansion and dissemination of
knowledge.”, “Impact Policies”, Research Councils UK, bit.ly/lznJpts.

10 See “Plans for world class research centre in the UK”, GOV.UK, 2014-03-18,
bit.ly/1j6CZYB.

11 See “Autumn Statement 2014”, H M Treasury, December 2014, p. 50, PDF,
bit.ly/15006QS.

On five main fronts, the potential impact of SPF is large: 


• Economic. As noted in Section 2.6, a conservative estimate of the potential
value of the SP concepts is at least $190 billion, every year.

• Social. As already noted, there are many potential benefits and applications
of the SP system, some of them outlined in Sections 2.3 to 2.6 and in
Appendix C.

• Environmental. The SP concepts have potential to deliver more for less,
with consequent environmental benefits, via a combination of: 1) gains in
economic and social impact (as just noted); 2) reductions in energy
consumption and in the size and weight of computers (Sections 2.3.2 and 2.4).¹²


• Academic. We expect the establishment of SPF, with associated publicity, 
to encourage and facilitate contributions by research groups and individual 
researchers all around the world. Given the wide scope of the SP system 
and the wide range of its potential benefits and applications, the academic 
impact of SPF is potentially very large. Research that is done with SPF may 
be seen as a ‘dividend’ from the investment. From the perspective of the UK, 
research that is done by research groups and individual researchers outside 
the UK may be seen as a bonus, additional to what would be achieved if the 
facility were restricted to UK researchers. In addition, there are likely to be 
benefits arising from synergies and large-scale collaboration (Section 2.2).

Initially, we aim to raise awareness of the facility amongst researchers and 
to encourage relevant research projects. But later, we expect that research 
with the facility will develop its own momentum. 


• Publicity. As a condition for using SPF, researchers would be asked to
ensure that every relevant publication in an academic journal, collection of
academic papers, or conference proceedings, contains an acknowledgement
of the facility and of the body or bodies that have provided the necessary
funds. This in itself would be publicity. But, in addition, there is potential
for publicity in articles in newspapers and magazines, in programmes on
radio or TV, and on the internet, in blogs, mailing lists, Facebook pages, and
the like.

12 There would also be a need for measures to counteract the effects of the “Jevons paradox”
(Wikipedia, bit.ly/ldUJF9C, retrieved 2014-12-22): how gains in efficiency may, without
appropriate controls, lead to an overall increase in consumption.

2.10 International influence 

“Our priorities for maximising international impact are: to seek
increased influence for the UK in research, especially in Europe; in the
long term to work towards the siting of a major international research
facility in the UK.”, [13, p. 21].

Given the wide scope of the SP system, the international dimension of SPF,
and its potential impact (Section 2.9), it has potential to be “a major international
research facility” and to increase the international influence of STFC or another
hosting organisation.


3 The proposed facility 

As mentioned in the Introduction, it is intended that SPF will facilitate the
development of a high-parallel version of the SP machine, realised as an open-source
software virtual machine, and hosted on an existing high-performance computer.
How things may develop is shown schematically in Figure 1.

Aspects of the proposal are described in the following subsections. 


3.1 The proposed facility and public policies 


Some aspects of SPF may seem unfamiliar and may, superficially, appear to conflict
with funding policies. These remarks apply, in particular, to the proposal that
SPF should be permanent (Section 3.3), that it should be available to researchers
anywhere in the world (Section 3.4), and that, with few if any exceptions, there
should be no charges for using the facility (Section 3.5).

But it appears that, at high levels, it is recognised that funding for
e-infrastructure should be responsive and flexible, that it should recognise the diverse
needs of the research community, and that there is a need for long-term
perspectives. In support of such policies:


• “In the roadmap, EPSRC aims to ... understand the requirements of the
EPS [engineering and physical sciences] research community that make use
of e-infrastructure, ensuring there are no gaps or duplication.” (From p. 3
of “EPSRC e-infrastructure roadmap”, Engineering and Physical Sciences
Research Council, August 2014, bit.ly/lHvM34x).

• “The funding agencies should recognise the need for flexibility of funding and
put in place mechanisms to enable rapid responses to hardware and software
developments and to new research opportunities and requirements.”
(Recommendation 3 from p. 17 in “Strategy for the UK Research Computing
Ecosystem”, The e-Science Institute at The University of Edinburgh,
October 2011, bit.ly/lKslhrQ).

Figure 1: Schematic representation of the development and application of the SP
machine.

3.2 Elements of SPF 

It is envisaged that the main features of SPF will be these: 


• The SP machine, version 1. SPF will contain the first high-parallel version of
the SP machine, which may be referred to as the “SP machine, version 1” or
“SPM1”.¹³ The name “SPM” would normally mean SPM1 but, depending
on the context, it may be used to refer to any other version of the SP machine
or to all versions, collectively.

• Founded on the SP71 computer model. The SP71 computer model (Appendix
A.1), the basis for SPM, will be ported on to the host machine. This should
be relatively straightforward to do since the model is written in C++, with
good comments, and since a slightly earlier version, SP70, is described quite
fully, with pseudocode, in [19, Sections 3.9, 3.10, and 9.2].

13 Although the SP computer model may be seen as a version of the SP machine (Appendix
A.1), we think it is probably best if the first SPF version is called “SPM1”.


• High levels of parallel processing. As far as possible, processes in the SP71
code will be modified to take advantage of parallel processing. This may 
be done in three main parts of the system, in procedures for: the matching 
and unification of patterns (as outlined in Appendix A.3.1); the building 
of multiple alignments (Appendix A.3.2); and the unsupervised learning of 
grammars (Appendix A.3.3). 
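
As a rough illustration of the first of these three areas, the sketch below scores one incoming pattern against a collection of stored patterns, with each comparison running as a separate asynchronous task. Everything in it is an assumption made for illustration: the names `Pattern`, `lcs_length` and `match_in_parallel` are hypothetical, and the longest-common-subsequence score is only a simple stand-in, not the matching procedure used in SP71.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: a pattern is a sequence of symbols.
using Pattern = std::vector<std::string>;

// Length of the longest common subsequence of symbols in two patterns.
// A stand-in similarity score only; NOT the SP71 matching procedure.
std::size_t lcs_length(const Pattern& a, const Pattern& b) {
    std::vector<std::size_t> prev(b.size() + 1, 0), curr(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            curr[j] = (a[i - 1] == b[j - 1])
                          ? prev[j - 1] + 1
                          : std::max(prev[j], curr[j - 1]);
        }
        std::swap(prev, curr);  // the row just computed becomes "prev"
    }
    return prev[b.size()];
}

// Score `incoming` against every stored pattern, one asynchronous task per
// stored pattern. Scores are returned in the same order as `stored`.
std::vector<std::size_t> match_in_parallel(const Pattern& incoming,
                                           const std::vector<Pattern>& stored) {
    std::vector<std::future<std::size_t>> tasks;
    tasks.reserve(stored.size());
    for (const Pattern& candidate : stored) {
        tasks.push_back(std::async(std::launch::async, [&incoming, &candidate] {
            return lcs_length(incoming, candidate);
        }));
    }
    std::vector<std::size_t> scores;
    scores.reserve(tasks.size());
    for (auto& task : tasks) {
        scores.push_back(task.get());  // blocks until that comparison is done
    }
    return scores;
}
```

Launching one task per stored pattern keeps the sketch short. On a high-performance host, a fixed pool of workers or the host's own parallel runtime would probably be a better fit, since the comparisons are independent of one another and can be distributed freely.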


• Good user interface. SPF will be accessed via a well-designed website, with
user-friendly interfaces for each part of the system, using multimedia where
appropriate. It would be best if users did not have to download and install
any software. Any such downloading of software can probably be avoided by
developing the user interface using the facilities of HTML5. In accordance
with Google’s new search policies, announced in April 2015,¹⁴ the website
should be “mobile friendly”, meaning that it should be easy to use and legible
on mobile phones.


• Setting up accounts. There will be procedures for the registration of each
user, assigning a unique identifier and setting up facilities for each account.¹⁵
Here, a “user” will be a research group or an individual researcher. To avoid
undue complications, at least initially, each research group will be treated as
if it were an individual researcher, with no attempt to differentiate individual
researchers within each group, or their work. Each research group would
need to make its own arrangements for the division of labour within the
group.


It is likely that most of the necessary mechanisms are already established for 
registering users of the host machine. 


• Creating new versions of the SP machine. Any user may create one or more 
new versions of SPM: 


— Each such version will have its own storage space with documentation, 
source hies, executable hies, data hies, and so on. 


14 See, for example, “Google search changes will promote mobile-friendly sites”, BBC News, 
2015-04-20, bbc.in/lQ9AORA 

15 As noted in Section 2.9 each user would be asked to ensure that, in all academic 
publications arising from research with SPF, there would be an acknowledgement of the facility 
and of the body or bodies that have provided the necessary funds. 
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— There will an hierarchical scheme of the usual kind for the identifiers of 
different versions, with identiher(s) for the creator(s) of each version. 

— The documentation should make clear the origins of each version in one 
or more previous versions, which may include SPM1 or other versions 
by the same user or other users. 

— Any version of SPM will be in one of two states: “under development”
or “completed”. Once a version has been “completed”, further
modifications will not be allowed. The executable code of any completed
version may be run by any other user (see the next item).
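
As an illustration only, the rules above for versions of SPM might be sketched in Python as follows. The class name, the field names and the identifier format ("groupA/spm2") are hypothetical and are not part of the proposal; the sketch simply encodes the two states, the record of origins, and the rule that a completed version may no longer be modified but may be run by any user.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class State(Enum):
    UNDER_DEVELOPMENT = "under development"
    COMPLETED = "completed"


@dataclass
class SPMVersion:
    """One version of SPM (all names here are illustrative assumptions)."""
    identifier: str                      # hierarchical identifier, hypothetical format
    creators: List[str]                  # identifier(s) of the creating user(s)
    derived_from: List[str] = field(default_factory=list)  # origins, e.g. ["SPM1"]
    state: State = State.UNDER_DEVELOPMENT
    files: Dict[str, str] = field(default_factory=dict)    # path -> content

    def modify(self, user: str, path: str, content: str) -> None:
        # Completed versions are frozen; only creators may edit a draft.
        if self.state is State.COMPLETED:
            raise PermissionError("completed versions cannot be modified")
        if user not in self.creators:
            raise PermissionError("only the creators may modify this version")
        self.files[path] = content

    def complete(self) -> None:
        self.state = State.COMPLETED

    def may_run(self, user: str) -> bool:
        # A draft may be run by its developers; a completed version by anyone.
        return self.state is State.COMPLETED or user in self.creators
```

Under these assumptions, a version created by "groupA" can be run only by "groupA" until `complete()` is called, after which `may_run` is true for every user and `modify` raises an error.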

• Running the SP machine. Any version of SPM that is under development 
may be run by the user that is developing it. Any completed version of SPM 
may be run by any user, which would of course include the user that created 
it. For any user to run any given version of SPM, facilities will be needed 
for: 


— Assigning an area of storage to hold data and results. 

— Setting parameters for the given version of SPM. 

— Creating and editing SP patterns. 

— Uploading SP patterns that have been created elsewhere. 

— Online debugging of software (for SPM versions that are under
development). It is likely that a suitable debugger is already established on
the host machine.

— Viewing results, including multiple alignments and grammars created
by the program. Since some results—multiple alignments in
particular—may be larger than can be seen comfortably on a typical
computer screen, there should be facilities for zooming and scrolling.

— Downloading SP patterns, including those newly created by the given
version of SPM.

• SP-neural, version 1. Alongside SPM1 will be a first, tentative, version of 
the SP-neural machine, which may be referred to as “SPNRL1”. The main 
elements of the SP-neural facility will be: 

— As with SPM, each user may create one or more versions of SPNRL.

— SPNRL1 will have a means of translating a set of SP patterns (or one
or more multiple alignments) into an inter-connected set of pattern
assemblies. With the object-oriented facilities of C++, there will be a
class for neurons and a class for pattern assemblies.

— In this preliminary model, each SP symbol will be represented by a
single neuron, although it is likely that in living brains, neural “symbols”
are realised with clusters of inter-connected neurons.

— Connections between neurons will be represented with C++ pointers.
As SPNRL is developed, it is likely that there will be excitatory
connections and also inhibitory connections, in accordance with what is known
about neural tissue. It is also likely that there will be lateral connections
between neighbouring neurons within each pattern assembly.

— There will be a means of displaying, in graphical form, a set of
inter-connected neurons and pattern assemblies, much as in Figure [IJ

— For any given set of inter-connected neurons and pattern assemblies, 
there will be a means of downloading it as a set of SP patterns. 

— As SPNRL is developed: 

* There will be a means of sending excitatory and inhibitory signals
through inter-connected pattern assemblies and displaying the
results in terms of the excitatory levels of individual neurons.

* There will be a means of creating new pattern assemblies and learning
neural ‘grammars’, in accordance with learning processes in any
given version of SPM.

* Parallel processing will be applied to mimic the high levels of
parallelism that exist in the workings of real neural tissue.
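
The data structures outlined for SPNRL1 can be illustrated with a minimal sketch. It is written in Python for brevity, with object references standing in for the C++ pointers envisaged above, and it is not the proposed implementation. The rule used here for connecting assemblies (neurons in different assemblies are linked when they represent the same symbol) is an assumption made purely for illustration, as are all the names.

```python
class Neuron:
    """One neuron per SP symbol, as in the preliminary model."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.excitatory = []   # references to other Neuron objects
        self.inhibitory = []   # unused in this sketch; reserved for later versions
        self.level = 0.0       # excitatory level


class PatternAssembly:
    """An ordered group of neurons corresponding to one SP pattern."""

    def __init__(self, symbols):
        self.neurons = [Neuron(s) for s in symbols]

    def to_pattern(self):
        return tuple(n.symbol for n in self.neurons)


def build_assemblies(patterns):
    """Translate SP patterns into inter-connected pattern assemblies."""
    assemblies = [PatternAssembly(p) for p in patterns]
    seen = {}  # symbol -> list of (assembly index, neuron) met so far
    for index, assembly in enumerate(assemblies):
        for neuron in assembly.neurons:
            for other_index, other in seen.get(neuron.symbol, []):
                if other_index != index:
                    # Assumed rule: same symbol in different assemblies
                    # gives a two-way excitatory connection.
                    neuron.excitatory.append(other)
                    other.excitatory.append(neuron)
            seen.setdefault(neuron.symbol, []).append((index, neuron))
    return assemblies


def to_patterns(assemblies):
    """Recover ('download') the SP patterns from a set of assemblies."""
    return [a.to_pattern() for a in assemblies]
```

For example, `build_assemblies([("NP", "D", "N"), ("D", "the"), ("N", "cat")])` yields three assemblies in which the "D" neuron of the first is connected to the "D" neuron of the second, and `to_patterns` returns the original three patterns.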


3.3 The facility should be permanent 

For reasons that follow, SPF should be permanent in the sense that it should be 
available for the foreseeable future: 


• Bridging the ‘Valley of Death’. In accordance with the observation that “It is
our historic failure to back [R&D, technology and engineering] which lies behind
the familiar problems of the so-called ‘valley of death’ between scientific
discoveries and commercial applications”,¹⁶ the facility should support and
encourage research that has sufficient breadth and depth to bridge that ‘valley
of death’. Short-termism is just as damaging for research as it is for business.


• Long-term research programmes: 

16 Speech by the Rt Hon David Willetts MP, “The ‘eight great technologies’ which will
propel the UK to future growth receive a funding boost”, 2013-01-24, bit.ly/lwKCqll

— Researchers need confidence that SPF will not be withdrawn. We envisage
the proposed facility providing the basis for research that is likely
to include long-term research programmes. Any research group that
wishes to embark on that kind of research programme needs to have
confidence that the research facility will not be withdrawn. Without
assurance for researchers that the facility is permanent, it is likely that
the entire project will fail.

— BIS: software as a sustained infrastructure in the long term. “Current
investments in software development should be reviewed with a view to
developing models of support for software as a sustained infrastructure
in the long term, as opposed to being supported by significant one off
investments.”¹⁷

— e-Science Institute: the need for long-term funding for ambitious
software development projects. “There needs to be long-term funding for
ambitious software development projects .... The current approach is
hardware centric and short-term ..., but does not recognise the critical
and long-term role that software and people play in the [research
computing] enterprise.”¹⁸

• Time to develop research programmes. It is likely to take time for a research 
group to learn about the SP system, to decide to mount a research project, to 
prepare a research proposal, for the proposal to be assessed, to provide PCs 
and other facilities, and to recruit staff. If the proposed research facility is 
only available for a few years (bearing in mind the time needed to create the 
facility), much of that time may be taken up with the processes described. 
Then it is likely that there would be little or no time left to do the actual 
research.

• Consistency across facilities. In the same way that large and expensive
facilities—large telescopes, atom smashers, supercomputers, and the like—
are maintained for many years, and replaced as necessary, the same should
apply to less expensive or smaller-scale facilities. In both cases, research
benefits from the long-term perspective that stability makes possible.


• Archival status. As outlined in Section 3.2, the facility will be designed so
that research groups and individual researchers may create their own versions
of the SP machine. Each such version needs to have archival status:


17 From p. 8 in “Report of the e-Infrastructure Advisory Group”, Department for Business, 
Innovation and Skills, June 2011, bit.ly/lLMXOSD. 

18 Recommendations #4 and #5, p. 17, in “Strategy for the UK Research Computing 
Ecosystem”. The e-Science Institute at The University of Edinburgh, October 2011, 
bit.ly/lKslhrQ. 








— The need for long-term access to and execution of software. Other
researchers, now and in the future, need to be able to access the source
code and documentation for any version of the SP machine and to run
the executable code. Naturally, the host machine for SPF may need to
be replaced from time to time. In any such case, SPF, including all
associated projects and data, may be ported on to the replacement machine,
in accordance with standard practice in industry and commerce.


— RCUK: software as a vital research infrastructure. The need for
long-term preservation of software and associated computing systems is
implied by: “Software developed for experimental facilities and
instrumentation, modelling and simulation and data-analysis is a critical and
valuable resource. Software and algorithm development represents major
investments by skilled researchers, and the large suite of codes and
algorithms used in research should be regarded as a vital research
infrastructure, requiring support and maintenance along the innovation
chain, and throughout its life cycle. The reproducibility of research is
at the very heart of the scientific method.”¹⁹

— Long-term preservation of software. This requirement accords with the
trend for academic journals to allow the author or authors of any
published paper to provide supplementary material, which may include
data, software, and things that may be regarded as forms of software,
such as MP4 files.²⁰ It is increasingly recognised that many bodies of
software should have the same archival status as peer-reviewed papers
in academic journals or conference proceedings, and bodies of data on
which those papers are based.


— Preserving the execution environment. Since software will often not
run, or will not run correctly, unless it is running on the type of
computer (with its operating system and related software) for which it was
written, there is also a need for long-term preservation of each type of
computer. There is more detail here:


* Vint Cerf: digital “dark age” and “digital vellum”. The case has
been argued by internet pioneer Vint Cerf, as reported in, for
example: “Google’s Vint Cerf warns of ‘digital Dark Age’”.²¹ Vint
Cerf also makes the case in a lecture that he gave on 2015-02-11,
“Digital vellum and the expansion of the internet into the solar
system” (see bit.ly/laqDeKw).

* Olive Executable Archive. How software and associated computing
systems may be preserved is demonstrated in the “Olive Executable
Archive”, www.olivearchive.org. This project has been highlighted
by Vint Cerf in his articles and lectures.

* New gold standard established for open and reproducible research.
The case has also been argued by computer scientists at Cambridge
University who “have set a new gold standard for openness and
reproducibility in research by sharing the more than 200GB of data
and 20,000 lines of code behind their latest results—an
unprecedented degree of openness in a peer-reviewed publication.”²²

* Maintaining access to digitised information. The need to maintain
access to digitised information in the face of changes in the
hardware or software environment that it needs is discussed in
“Preserving the digital record of computing history”.²³

19 From p. 8 of “E-infrastructure roadmap”, RCUK e-Infrastructure Group, 2014,
bit.ly/lJx59EE

20 The journal Royal Society Open Science says “Supplementary material can be used for
supporting data sets, supporting movies, figures and tables, and any other supporting
material.” (“Instructions for Authors”, bit.ly/lwD8ezN); and IEEE Access allows authors to
include “data collections and multimedia materials” (“IEEE Access- Information for Authors”,
PDF, bit.ly/lwD5H8x).


As noted above, replacement of SPF’s host machine should not be a 
problem. The system may be ported onto any new machine, as or when 
that becomes necessary. 


• Avoiding waste and gaining full value from SPF. Given that the bulk of the 
cost of SPF is in its initial development and setting up, there is little scope 
for saving money by terminating the facility within a few years. Indeed, any 
such move is likely to mean a waste of resources and a failure to gain full 
value from the facility: 


— The damaging effect of short-termism. As noted above, it is likely that 
the entire project will fail unless researchers have confidence that the 
facility is permanent. 


— Loss of valuable research. Premature termination of SPF would mean
losses of valuable research that may otherwise accrue, much of it
provided at no cost to the UK by researchers elsewhere (Sections 2.9 and
3.5).


21 BBC News, 2015-02-13, bbc.in/lD3pemp 

22 “New gold standard established for open and reproducible research”, Cambridge 
University press release, 2015-05-04, bit.ly/lIgFkKG 

23 Article by David Anderson in Communications of the ACM , vol. 58, no. 7, July 2015, 
pp. 29-31. 










— SPF will not wear out and its value will increase. Unlike ordinary 
research facilities such as space telescopes or atom smashers, SPF will 
not wear out. And its value for researchers will increase progressively 
as new versions of SPM are created and made available for the whole 
research community, including all new insights and improvements. 


3.4 The facility should be available to researchers anywhere in the world

As previously noted, the wide scope of the SP system means that there are far
more avenues to be explored than any one research group could tackle on its
own. Accordingly, we are aiming to create a facility for research groups and
individual researchers anywhere in the world. It should not be restricted to UK
scientists:


• This accords with what John Kelly and Steve Hamm say about the need
for large-scale collaboration (quoted in Section 1.1) and also with the need
for integration of e-infrastructures “internationally, across other national
e-infrastructures, to deliver end-to-end services in the global environment of
collaborative research”²⁴ and the need to facilitate “... research collaboration
between industry and academia.”²⁵

• It also accords with the principles that:

— The EPSRC aims to “Understand the whole UK e-infrastructure landscape,
view it holistically and consider it within an international context”
(emphasis added).²⁶

— “The UK’s Research and Innovation e-infrastructure needs to be led and
driven to deliver a UK wide vision for research e-infrastructure, embedded
in the international context essential to today’s research challenges.”
(emphasis added)²⁷

• Opening up the research in this way is likely to yield synergies that might
not arise if there were more restrictions (Section 2.2)—with corresponding
benefits for UK researchers.


24 p. 2 in E-infrastructure roadmap, Research Councils UK, bit.ly/lJx59EE

25 ibid., p. 5.

26 The EPSRC “E-Infrastructure Roadmap”, retrieved 2015-07-01, bit.ly/lR3V767

27 “Report of the e-Infrastructure Advisory Group”, Department of Business, Innovation and
Skills, June 2011, p. 3, bit.ly/lLMXOSD

• The international perspective will help to achieve full value from the facility
(Section 2.1).

• And it will help to increase the impact of the facility (Section 2.9) and its
international influence (Section 2.10).


3.5 The facility should be easy to access and easy to use 

In keeping with the main points in the preceding subsection, it is important to
minimise obstacles to the use of the system for preliminary investigations, for teaching
students, for individual research projects, and for long-term research programmes.
An additional reason is that, for any research group or individual researcher that is
not already familiar with the SP concepts, a significant amount of time and effort
is likely to be required to get up to speed, so it is important to minimise other
obstacles to progress.

For these reasons: 

• Ease of access. The process of registering with the system and gaining access 
to it should be as simple and straightforward as possible. 

• User friendly. For all aspects and parts of the system, there should be a 
user-friendly interface. 

• Free at the point of use. With few if any exceptions, the system should be 
free for users. In addition to minimising obstacles for researchers, reasons 
include: 


— Research dividends. In keeping with remarks about research ‘dividend’
in Section 2.9, the overall value of research done with SPF is likely to
outweigh the cost of providing and maintaining the facility, and by a
considerable margin.

— Research bonuses. As noted in Section 2.9, research that is done by
research groups and individual researchers outside the UK may be seen
as a bonus, additional to what would be achieved if the facility were
restricted to UK researchers.

— Research synergies. Also, as noted in Section 2.2, there are likely to be
benefits from synergies and large-scale collaboration.

— The ‘Valley of Death’. In accordance with remarks in Section 3.3 about
the ‘valley of death’, it is important to avoid any premature requirement
that the research should, commercially, “stand on its own two feet” or
“become self-sustaining”, in much the same way that we do not expect
young children to go out and earn a living.

With regard to costs, these may be met as follows:

• Development and maintenance costs. These costs may be met from a variety 
of sources, perhaps including funds for the support of researchers based in 
the UK. 

• Running costs. Although it is likely that mature versions of SPM will be 
used for computationally-demanding tasks, we believe that much research 
with SPF can be done with small examples that make small computational 
demands on the host computer, and that the associated costs would be
relatively small. It seems likely that the cost of administering a system of
charges for running costs would outweigh the actual costs. In that case, it 
would be best, probably, not to attempt to make any charges for the use of 
the system. A possible exception is where a user wishes to run the system 
with computational tasks that are likely to consume large computational 
resources. 


3.6 Open source 

As already mentioned, SPM should be open source. More specifically, it should 
be possible for anyone—including people who are not registered users of SPF—to 
view the source code and documentation of any completed version of SPM, and 
for any registered user to run any completed version. 

In keeping with these points, all versions of SPM should be “free software”
conforming to the GNU General Public License, described on
www.gnu.org/licenses/gpl.html. In brief, this means that every user should have:

• The freedom to use the software for any purpose;

• The freedom to change the software to suit his or her needs;

• The freedom to share the software with friends and neighbours; and

• The freedom to share the changes that any user makes.

This policy conforms to and supports the long-established principle that science 
works best when ideas, observations and experimental results are freely available to 
everyone, without restrictions. This kind of openness and transparency is essential 
if we are to make the difficult transition to cognitive computing (Section 2.3.1).

4 Costs

Costs associated with SPF are estimated in the following subsections. A
spreadsheet that summarises all the estimated costs, with totals, is shown in
Section 4.3.

4.1 Development costs 

We estimate that three members of staff with the right skills who are already
familiar with the host machine would be able to develop SPF within 2 years.
We suggest that it would be appropriate to employ one relatively senior person,
perhaps at a salary of about £60,000, one more junior person, perhaps at a salary
of about £40,000, and a third person, possibly a student, at a salary of about
£20,000.

We have assumed that the cost of employing someone is twice the cost of his
or her salary, and of course salaries would be paid for two years.

We have allowed £15,000 for PCs, printers, software, and related facilities for 
the three staff. 

In view of the importance of the website for SPF and the specialised skills 
required for good website design—including the skills needed to ensure that the 
website is “mobile friendly”—we believe it would be prudent to employ a web 
design company to establish the framework for the website and advise on how it 
should be developed. We estimate that £2000 should be allowed for this work. 
We have allowed 10% for contingencies. 

4.2 Recurrent costs 

The annual costs of running SPF are estimated in the following subsections and
summarised in Section 4.3.

4.2.1 Maintenance costs 

We believe that when SPF is up and running, it will need support: to make sure 
that it runs smoothly and to deal with any snags that arise; to answer queries by 
users; to fix bugs in the system; and to refine the system in the light of feedback 
from users. 

We believe that these things can be done by one person with a salary of about
£40,000 pa, with an assistant at about £20,000 pa.

4.2.2 Running costs

As mentioned in Section 3.5, we believe that much of the research with SPF would
make small demands on the host computer and, probably, that the associated costs
would be smaller than the cost of administering a system of charges. Nevertheless,
such costs do need to be covered. We estimate that they would be £10,000 pa.


4.2.3 Awareness-raising costs 

In accordance with “impact” policies of the UK Government and UK Research
Councils (Section 2.9), it is important that SPF and what it can do should be well
known around the world, amongst academic and industrial researchers, amongst
research administrators and policy-makers, and amongst members of the general
public. This may be achieved via the following main routes, each one with
associated costs:

• Via papers in academic journals and conference proceedings. Most of these 
would be prepared by research groups and individual researchers that are 
using SPF, and they would pay the associated costs. But, in addition, there 
would be academic papers about SPF and SPM, prepared by the two
authors of this proposal and staff recruited to develop SPF. The main costs
here would be the costs of attending conferences and charges for open-access 
publication. 

• The proposers, and perhaps development staff as well, may travel to give 
talks about SPF and the SP concepts. 

• Perhaps via conferences (online or traditional) dedicated to research using
SPF. The organisation may be undertaken by a specialist conference
organiser, and there would be associated costs.

• Via articles in popular-science magazines, both those for the general reader 
and those with a more technical or specialist orientation. Here, it would be 
useful if articles could be prepared by writers with relevant skills (who would 
need to be paid), although technical input would be provided by the authors 
of this proposal and research/development staff. Articles may be translated 
from English into other languages. 

• Via the mainstream media (newspapers, radio and TV), internet blogs, and 
online videos. Again, it would be useful if articles, videos and the like could 
be prepared by people with relevant skills, with guidance from people with 
relevant technical knowledge.

We estimate that a budget of £50,000 pa would be needed to cover these costs
during the two-year development phase, reducing to £20,000 pa for a further three
years. Although we believe it is important that SPF should be a permanent facility,
as described in Section 3.3, we shall assume for present purposes that
awareness-raising costs will be spread over a period of 20 years. In that case, the annual cost
will be £((50,000 × 2) + (20,000 × 3))/20 = £8,000.


4.2.4 Contingencies 

As with the non-recurrent development costs, we have allowed 10% for
contingencies.


4.3 Summary of costs 

In this section, Table 1 summarises our estimates of non-recurrent costs for SPF
(£546,700 in total) and recurrent costs (£151,800 pa in total).
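
The arithmetic behind these two totals can be checked with a short calculation. The sketch below is illustrative only: it uses the salary figures quoted in Sections 4.1 and 4.2.1, the stated assumption that employing someone costs twice his or her salary, and the 10% allowance for contingencies. The function and variable names are ours.

```python
OVERHEAD_FACTOR = 2        # cost of employment = twice the salary (Section 4.1)
CONTINGENCY_PERCENT = 10   # allowance for contingencies


def total_with_contingency(items):
    """Sum the cost items and add the contingency allowance (whole pounds)."""
    subtotal = sum(items)
    return subtotal * (100 + CONTINGENCY_PERCENT) // 100


# Non-recurrent (development) costs, Section 4.1.
development_salaries = [60_000, 40_000, 20_000]
development_total = total_with_contingency([
    sum(development_salaries) * OVERHEAD_FACTOR * 2,  # salaries + overheads, two years
    15_000,                                           # PCs, printers, software
    2_000,                                            # website design
])

# Recurrent (annual) costs, Section 4.2.
awareness_pa = (50_000 * 2 + 20_000 * 3) // 20        # spread over 20 years
recurrent_total = total_with_contingency([
    (40_000 + 20_000) * OVERHEAD_FACTOR,              # maintenance salaries + overheads
    10_000,                                           # running costs
    awareness_pa,                                     # awareness raising
])

print(development_total, recurrent_total, awareness_pa)
```

With these inputs the calculation gives £546,700 for the non-recurrent total, £151,800 pa for the recurrent total, and £8,000 pa for awareness raising.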


5 Conclusion 

The wide scope of the SP theory and the wide range of its potential benefits and
applications mean that there are many more avenues to be explored than could
be tackled by any one research group. By providing the means for researchers 
worldwide to participate and collaborate, the research facility that is proposed in 
this document would facilitate the further development of the SP system and the 
realisation of its potential. 

In view of that potential, and the relatively small cost of the proposed facility, 
the creation of that facility would be a good investment. 


Appendices 

A Outline of the SP theory and SP computer model

The SP theory is conceived as an abstract brain-like system that, in an ‘input’
perspective, may receive New information via its senses, and compress some or
all of it to create Old information, as illustrated schematically in Figure 2. In
the theory, information compression is the mechanism both for the learning and
organisation of knowledge and for pattern recognition, reasoning, problem solving,
and more.

Category                                          £

Non-recurrent (development)
  Development salaries
    Senior                                   70,000
    Middle                                   45,000
    Junior                                   25,000
  Total (salaries, one year)                140,000
  Total (salaries + overheads, one year)    280,000
  Total (salaries + overheads, two years)   560,000
  PCs etc                                    15,000
  Website design                              2,000
  Sub-total (non-recurrent)                 577,000
  Contingencies (10%)                        57,700
  Total (non-recurrent)                     634,700

Recurrent (per year)
  Maintenance salaries
    Middle                                   45,000
    Junior                                   25,000
  Total (salaries)                           70,000
  Total (salaries + overheads)              140,000
  Running costs                              10,000
  Awareness raising                           8,000
  Sub-total (recurrent)                     158,000
  Contingencies (10%)                        15,800
  Total (recurrent)                         173,800

Table 1: A spreadsheet summarising our estimates of non-recurrent and recurrent
costs for SPF.




Figure 2: Schematic representation of the SP system from an ‘input’ perspective. 


The subsections that follow outline the main elements of the SP theory and 
the SP machine. 


A.1 The SP computer model and the SP machine


The SP theory is realised in the form of a computer model, SP71, which may be 
regarded as a version of the SP machine. 

As noted in the Introduction, the use of a computer model as a vehicle for the 
theory has helped to reduce vagueness in the theory, it has provided a means of 
testing candidate ideas, many of which have been rejected, and it is a means of 
demonstrating what can be done with the system. 

An outline of how the SP computer model works may be found in P3S Section
3.9], with more detail, including pseudocode, in pH?] Sections 3.10 and 9.2].²⁸
Fully commented source code for the SP71 computer model may be downloaded via a
link near the bottom of www.cognitionresearch.org/sp.htm, and via “Ancillary
files” under www.arxiv.org/abs/1306.3888.

As previously noted, we envisage that the SP computer model will be the basis 
for the creation of the high-parallel, open-source version of the SP machine. How 
things may develop is shown schematically in Figure 1.


28 These sources describe SP70, a slightly earlier version of the model than SP71 but quite
similar to it. The description of SP70 includes a description, in ITT Sections 3.9.1 and 3.10], of
a subset of the SP70 model called SP61.

A.2 Patterns and symbols

In the SP system, knowledge is represented with arrays of atomic symbols in one
or two dimensions called patterns. The SP71 model works with 1D patterns but
it is envisaged that the system will be generalised to work with 2D patterns Ell
Section 3.3].

An ‘atomic symbol’ in the SP system is simply a mark that can be matched 
with any other symbol to determine whether it is the same or different: no other 
result is permitted. 

In themselves, SP patterns are not particularly expressive. But within the
multiple alignment framework (Appendix A.3.2), they support the representation
and processing of a wide variety of kinds of knowledge (Appendix A.4). A goal of
the SP research programme is to establish one system for the representation and
processing of all kinds of knowledge (see also m Section III]). Evidence to date
suggests that this may be achieved with SP patterns in the multiple alignment
framework.

Any collection of SP patterns is termed a grammar. Although that term is 
most closely associated with linguistics, it is used in the SP research programme 
for a collection of SP patterns describing any kind of knowledge. 
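
To make these definitions concrete, here is a minimal sketch in Python of how 1D patterns, atomic symbols and a grammar might be represented. The type names and the whitespace-separated text format are assumptions for illustration and are not taken from the SP71 source code.

```python
from typing import List, Tuple

Symbol = str                  # an atomic symbol: just a mark
Pattern = Tuple[Symbol, ...]  # a 1D pattern: an ordered array of symbols
Grammar = List[Pattern]       # a grammar: any collection of SP patterns


def symbols_match(a: Symbol, b: Symbol) -> bool:
    """The only operation on symbols: are they the same or different?"""
    return a == b


def pattern_from_text(text: str) -> Pattern:
    """Build a pattern from whitespace-separated symbols (assumed format)."""
    return tuple(text.split())


# A tiny grammar; the symbols are arbitrary marks with no built-in meaning.
grammar: Grammar = [
    pattern_from_text("NP D N"),
    pattern_from_text("D t h e"),
    pattern_from_text("N c a t"),
]
```

Note that nothing in the representation distinguishes linguistic knowledge from any other kind: the same structures could hold patterns describing, say, plants or medical symptoms.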


A.3 Information compression 

In the SP theory, the emphasis on information compression derives from earlier
research on grammatical inference [16] and the principle of minimum length encoding

(mle) pa nanni). 

At an abstract level, information compression means the detection and
reduction of redundancy in information. In more concrete terms, redundancy means
recurrent patterns, regularities, structures, and associations, including causal
associations. Thus information compression provides a means of discovering such
things as words in natural language [16], objects [23, Section V-E], and
associations (see, for example, [24, Section III-A.1]), in accordance with the DONSVIC
principle [21, Section 5.2].²⁹

The default assumption in the SP theory is that compression of information is
always lossless, meaning that all non-redundant information is retained. In
particular applications, there may be a case for discarding non-redundant information
(see, for example, [23, Section X-B]) but any such discard is reversible.

In the SP system, information compression is achieved via the matching and 
unification of patterns. More specifically, it is achieved via the building of multiple 
alignments and via the unsupervised learning of grammars. These three things are 
described briefly in the following three subsections. 


29 DONSVIC = “The discovery of natural structures via information compression”. 






A.3.1 Information compression via the matching and unification of 
patterns 

The basis for information compression in the SP system is a process of searching
for patterns that match each other, together with a process of merging or
‘unifying’ patterns that are the same. At the heart of the SP71 model is a method
for finding good full and partial matches between sequences, with advantages
compared with classical methods [El Appendix A].³⁰
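
The idea of full and partial matching, followed by unification, can be illustrated with a short sketch. Note that this is not the SP71 method: it uses the classical dynamic-programming algorithm for a longest common subsequence, which finds only one ‘best’ match, whereas the SP71 method is said above to have advantages over classical methods. The function names are ours.

```python
from collections import Counter


def best_partial_match(a, b):
    """Return index pairs (i, j), with a[i] == b[j], for one longest
    common subsequence of the symbol sequences a and b."""
    rows, cols = len(a), len(b)
    # table[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def unify(patterns):
    """Merge identical patterns, keeping one copy of each with its frequency."""
    return Counter(tuple(p) for p in patterns)
```

For example, matching `"t h e c a t".split()` against `"t h e b l a c k c a t".split()` gives six matched symbols, a full match of the first sequence within the second, while `unify` applied to three patterns of which two are identical keeps two distinct patterns, one with a frequency of 2.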

A.3.2 Information compression via the building of multiple alignments 

That process for finding good full and partial matches between patterns is the
foundation for processes that build multiple alignments like the one shown in
Figure 3. This concept is similar to multiple alignment in bioinformatics but with
important differences m Section 3.4]. It is a powerful and distinctive feature of
the SP system.


30 The main advantages are [SI Section 3.10.3.1]: 1) That it can match arbitrarily long
sequences without excessive demands on memory; 2) For any two sequences, it can find a set of
alternative matches (each with a measure of how good it is) instead of a single ‘best’ match; 3)
The ‘depth’ or thoroughness of the searching can be controlled by parameters.

Figure 3: The best multiple alignment created by the SP computer model, with a set of New patterns
(in column 0) that describe some features of an unknown plant, and a set of Old patterns, including
those shown in columns 1 to 6, that describe different categories of plant, with their parts and
sub-parts, and other attributes.

This example shows the best multiple alignment created by the SP computer
model when a set of New patterns (in column 0) is processed in conjunction with
a set of pre-existing Old patterns like those shown in columns 1 to 6. Here, the
multiple alignment is ‘best’ because it is the one that achieves the most economical
description of the New patterns in terms of the Old patterns. The way in which
that description or ‘encoding’ is derived from a multiple alignment is explained in
IIS, Section 3.5] and [21, Section 4.1]. Like all other kinds of knowledge, encodings
derived from multiple alignments are recorded using SP patterns (Appendix A.2).

This multiple alignment may be interpreted as the result of a process of
recognition (Appendix C.5). The New patterns represent the features of some unknown
plant and the Old patterns in columns 1 to 6 show how the plant has been
identified at several levels of abstraction: species ‘Meadow Buttercup’ (column 1), genus
Ranunculus (column 6), family Ranunculaceae (column 5), and so on.

A.3.3 Information compression via the unsupervised learning of grammars

As outlined in [19, Section 3.9.2] and [21, Section 5.1], and described more fully
in [19, Chapter 9], the SP system may, without assistance from a “teacher” or
anything equivalent, derive one or more plausible grammars from a body of New
patterns, with minimum length encoding as a guiding principle. In that process,
multiple alignment has a central role as a source of SP patterns for possible
inclusion in any grammar [23, Section V-B1].

A.3.4 Heuristic search 

Like most problems in artificial intelligence, each of the afore-mentioned
problems—finding good full and partial matches between patterns, finding or
constructing good multiple alignments, and inferring one or more good grammars from
a body of data—is normally too complex to be solved by exhaustive search.

With intractable problems like these, it is often assumed that the goal is to find 
theoretically ideal solutions. But with these and most other AI problems, “The 
best is the enemy of the good”. By scaling back one’s ambitions and searching for 
“reasonably good” solutions, it is often possible to find solutions that are useful, 
and without undue computational demands. 

31 Specifically, the New patterns in this example are ‘has_chlorophyll’ (a pattern with one
symbol), ‘<stem> hairy </stem>’, ‘<petals> yellow </petals>’, ‘<stamens> numerous
</stamens>’, and ‘<habitat> meadows </habitat>’. The patterns in a set like that may be
presented to the system and processed in any order.

As with other AI applications, and as with the building of multiple alignments
in bioinformatics, the SP71 model uses heuristic techniques in all three cases
mentioned above. This means searching for solutions in stages, with a pruning of the
search tree at every stage, guided by measures of success [19, Appendix A; Sections
3.9 and 3.10; Chapter 9]. With these kinds of techniques, acceptably good
approximate solutions can normally be found without excessive computational demands
and with “big O” values that are within acceptable limits.
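
The idea of searching in stages, with pruning at every stage, can be illustrated with a small beam search for full and partial matches between two sequences. This is a minimal sketch under stated assumptions and not the algorithm used in the SP71 model: the function name `beam_match`, the measure of success (the number of matched symbols) and the `beam_width` parameter are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A match is a tuple of (index in a, index in b) pairs, in increasing order.
Match = Tuple[Tuple[int, int], ...]


def beam_match(a: List[str], b: List[str], beam_width: int = 3) -> List[Match]:
    """Return up to `beam_width` alternative in-order matches between a and b.

    The search advances one symbol of `a` per stage. At each stage every
    surviving partial match is either left as it is or extended with the
    nearest later occurrence of the symbol in `b`; the candidates are then
    ranked and all but the best `beam_width` are pruned.
    """
    beam: List[Match] = [()]
    for i, symbol in enumerate(a):
        candidates = set(beam)  # option 1: leave a[i] unmatched
        for match in beam:
            start = match[-1][1] + 1 if match else 0
            for j in range(start, len(b)):
                if b[j] == symbol:
                    candidates.add(match + ((i, j),))  # option 2: extend
                    break
        # Rank: more matched symbols first, then matches that use less of b,
        # then the match itself so that the ordering is deterministic.
        ranked = sorted(
            candidates,
            key=lambda m: (-len(m), m[-1][1] if m else -1, m),
        )
        beam = ranked[:beam_width]  # pruning step
    return beam


alternatives = beam_match(list("abcd"), list("xaybzcd"))
for match in alternatives:
    print(len(match), match)
```

A larger `beam_width` makes the search more thorough at greater computational cost, which is the kind of trade-off that parameters controlling the 'depth' of searching are meant to expose.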

A.4 Multiple alignment and the representation and processing
of diverse kinds of knowledge

The expressive power of SP patterns within the multiple alignment framework
derives in large part from the way that symbols in one pattern may serve as links
to one or more other patterns or parts thereof. One of several examples in
Figure 3 is how the pair of symbols ‘<family> . . . </family>’ in column 6 serves
to identify the pattern ‘<family> . . . Ranunculales . . . <hermaphrodite>
. . . poisonous . . . </family>’ in column 5.

In the figure, these kinds of linkages between patterns mean that the unknown
plant (with characteristics shown in column 0) may be recognised at several
different levels within a hierarchy of classes: genus, family, order, class, and so on.
Although it is not shown in this example, the system also supports
cross-classification.

In the figure, the parts and sub-parts of the plant are shown in such structures
as ‘<shoot>’ (column 3), ‘<flowers>’ (column 5), ‘<petals>’ (column 6), and so
on.

As in conventional systems for object-oriented design, the system provides for
inheritance of attributes (Appendix C.8). But unlike such systems, there is smooth
integration of class hierarchies and part-whole hierarchies, without awkward
inconsistencies [20, Section 4.2.1].
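
As a rough illustration of inheritance of attributes through a class hierarchy, the following sketch collects attributes for a species from its genus, family and order. It is a conventional lookup, not the multiple alignment process itself. The names `PARENT`, `ATTRIBUTES` and `inherited_attributes` are hypothetical, and the attribute values are illustrative assumptions loosely based on Figure 3.

```python
from typing import Dict, List, Optional

# Hypothetical class hierarchy, loosely based on Figure 3.
PARENT: Dict[str, Optional[str]] = {
    "Meadow Buttercup": "Ranunculus",
    "Ranunculus": "Ranunculaceae",
    "Ranunculaceae": "Ranunculales",
    "Ranunculales": None,
}

# Illustrative attributes recorded at different levels of the hierarchy.
ATTRIBUTES: Dict[str, Dict[str, str]] = {
    "Meadow Buttercup": {"petals": "yellow", "habitat": "meadows"},
    "Ranunculaceae": {"food_value": "poisonous"},
}


def inherited_attributes(name: str) -> Dict[str, str]:
    """Collect the attributes of a class and of all its ancestors.

    Attributes of more specific classes take precedence over those of
    more general classes.
    """
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    merged: Dict[str, str] = {}
    for cls in reversed(chain):  # most general first, most specific last
        merged.update(ATTRIBUTES.get(cls, {}))
    return merged


print(inherited_attributes("Meadow Buttercup"))
```

In this sketch the species inherits `food_value` from its family without that attribute being repeated at the species level.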

More generally, SP patterns within the multiple alignment framework provide
for the representation and processing of a wide variety of kinds of knowledge
including: the syntax and semantics of natural language; class hierarchies and part-whole
hierarchies (as just described); networks and trees; entity-relationship structures;
relational knowledge; rules and several kinds of reasoning; patterns and pattern
recognition; images; structures in three dimensions; and procedural knowledge.
There is a summary in [24, Section III-B], and more detail in Appendix C.
A.5 Information compression, prediction, and probabilities

Owing to the close connection between information compression and concepts of
prediction and probability [8], the SP system is fundamentally probabilistic.³²
Each SP pattern has an associated frequency of occurrence, and probabilities may
be calculated for each multiple alignment and for any inference that may be drawn
from any given multiple alignment.
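
The connection between frequency, probability and compression can be illustrated with the standard Shannon relation, in which an item with probability p has an ideal code length of -log2(p) bits. The sketch below illustrates only that general relation and is not the SP model's own calculation of probabilities for multiple alignments: the function name `code_lengths` and the frequency values are hypothetical.

```python
import math
from typing import Dict


def code_lengths(frequencies: Dict[str, int]) -> Dict[str, float]:
    """Return the ideal code length in bits, -log2(p), for each pattern.

    The probability p of a pattern is estimated as its relative frequency.
    """
    total = sum(frequencies.values())
    return {
        name: -math.log2(count / total)
        for name, count in frequencies.items()
    }


# Hypothetical frequencies of occurrence for four patterns.
frequencies = {"A": 8, "B": 4, "C": 2, "D": 2}
lengths = code_lengths(frequencies)
for name, bits in lengths.items():
    print(name, bits)
```

Frequent patterns receive short codes and rare patterns receive long ones, which is why a good compression of data and a good estimate of its probabilities go together.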

Although the SP system is fundamentally probabilistic: it can be constrained 
to answer only those kinds of questions where probabilities are close to 0 or 1; and, 
via the use of error-reducing redundancy, it can deliver decisions with high levels of 
confidence. Contrary to what one may suppose, there is no conflict between the use 
of error-reducing redundancy and the notion that “computing” may be understood 
as information compression—the two things are independent, as described in m 
Section 2.3.7]. 

A.6 SP-neural 

Part of the SP theory is the idea, described most fully in [19, Chapter 11], that
the abstract concepts of symbol and pattern in the SP theory may be realised more
concretely in the brain with collections of neurons in the cerebral cortex.

The neural equivalent of an SP pattern is called a pattern assembly. The word
“assembly” has been adopted in this term because the concept is quite similar
to Donald Hebb’s [5] concept of a cell assembly. The main difference is that
the concept of pattern assembly is unambiguously explicit in proposing that the
sharing of structure between two or more pattern assemblies is achieved by means
of ‘references’ from one structure to another, as described and discussed in [19,
Section 11.4.1].

It is pertinent to mention that unsupervised learning in the SP theory ([19,
Chapter 9], [21, Section 5]) is quite different from “Hebbian learning” as described
by Hebb [5] and widely adopted in the kinds of artificial neural networks that are
popular in computer science.³³ By contrast with Hebbian learning, the SP system,
like a person, may learn from a single exposure to some situation or event. And,
by contrast with Hebbian learning, it takes time to learn a language in the SP
system because of the complexity of the search space, not because of any kind of
gradual strengthening or “weighting” of links between neurons [19, Section 11.4.4].

32 There are reasons to believe that the same is true of “computing” at its most fundamental
level, and also mathematics. For example, Gregory Chaitin has written “I have recently been
able to take a further step along the path laid out by Gödel and Turing. By translating a
particular computer program into an algebraic equation of a type that was familiar even to the
ancient Greeks, I have shown that there is randomness in the branch of pure mathematics
known as number theory. My work indicates that—to borrow Einstein’s metaphor—God
sometimes plays dice with whole numbers.” @1 p. 80].

33 See, for example, “Hebbian theory”, Wikipedia, bit.ly/lsW6ATt, retrieved 2014-12-19.

Figure 4 shows schematically how pattern assemblies may be represented and
inter-connected with neurons. Here, each pattern assembly, such as ‘< NP < D >
< N > >’, is represented by the sequence of atomic symbols of the corresponding
SP pattern. Each atomic symbol, such as ‘<’ or ‘NP’, would be represented in the
pattern assembly by one neuron or a small group of inter-connected neurons.³⁴


Apart from the inter-connections amongst pattern assemblies, the cortex in SP- 
neural is somewhat like a sheet of paper on which knowledge may be written in 
the form of neurons. 

It is envisaged that any pattern assembly may be ‘recognised’ if it receives
more excitatory inputs than rival pattern assemblies, perhaps via a
winner-takes-all mechanism [19, Section 11.3.4]. And, once recognised, any pattern assembly
may itself be a source of excitatory signals leading to the recognition of higher-level
pattern assemblies.
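
A winner-takes-all choice amongst rival pattern assemblies can be sketched very simply, as below. This is only an illustration of the general idea, not a model of the neural mechanisms proposed in SP-neural: the function name `recognise`, the numeric excitation levels and the `margin` parameter are assumptions introduced here.

```python
from typing import Dict, Optional


def recognise(excitation: Dict[str, float], margin: float = 0.0) -> Optional[str]:
    """Return the pattern assembly with the most excitatory input.

    If there are no assemblies, or the leader does not exceed its nearest
    rival by more than `margin`, no assembly is recognised and None is
    returned.
    """
    if not excitation:
        return None
    ranked = sorted(excitation.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= margin:
        return None
    return ranked[0][0]


# Hypothetical excitation levels for three rival pattern assemblies.
winner = recognise({"NP": 3.0, "VP": 1.0, "PP": 0.5})
print(winner)
```

The recognised assembly could then contribute excitation to higher-level assemblies, with the same function applied again at the next level.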


B Distinctive features and apparent advantages 
of the SP system 

Information compression and concepts of probability are themes in other research, 
including research on Bayesian inference, Kolmogorov complexity, deep learning, 
artificial neural networks, minimum length encoding, unified theories of cognition, 
natural language processing and more. The main features that distinguish the SP 
system from these other areas of research, and apparent advantages compared with 
these other approaches, are described quite fully in “The SP theory of intelligence: 
its distinctive features and advantages” [27]. The main points are summarised 
here: 


• Simplification and integration of observations and concepts. As mentioned
above, the SP theory is a unique attempt to simplify and integrate
observations and concepts across artificial intelligence, mainstream computing,
mathematics, and human perception and cognition. The canvas is much
broader than it is, for example, in “unified theories of cognition”. It has
quite a lot to say, for example, about the nature of mathematics [17], [19,
Chapter 10], [25].

• Simplification and integration of computing systems. The provision of one
simple format for knowledge, and one framework—multiple alignment—for
the representation and processing of knowledge, promote an overall
simplification of computing systems, including both hardware and software [26,
Section 5]. They also promote seamless integration of diverse structures and
functions [26, Section 7].

34 Not shown in the figure are lateral connections within each pattern assembly and
inhibitory connections elsewhere, as outlined in [19, Sections 11.3.3 and 11.3.4].

Figure 4: Schematic representation of inter-connections amongst pattern
assemblies as described in the text. Not shown in the figure are lateral connections
within each pattern assembly, and inhibitory connections. Labels in the figure:
‘To higher-level pattern assemblies’ and ‘From lower-level pattern assemblies
and/or receptor arrays’.

• Descriptive and explanatory range. It appears that the descriptive and
explanatory range of the SP system is much wider than that of any of the alternatives
mentioned above. In terms of demonstrable capabilities as well as abstract
concepts, it has strengths in areas that include the representation of diverse
forms of knowledge (including class hierarchies, part-whole hierarchies, and
their seamless integration), unsupervised learning, natural language
processing, fuzzy pattern recognition and recognition at multiple levels of
abstraction, best-match and semantic forms of information retrieval, several kinds of
reasoning (one-step ‘deductive reasoning’, abductive reasoning, probabilistic
networks and trees, reasoning with ‘rules’, nonmonotonic reasoning, explaining
away, causal reasoning, and reasoning that is not supported by evidence),
planning, problem solving, information compression, and aspects of
neuroscience and of human perception and cognition.

• Many potential benefits and applications. In the spirit of the quote at the
start of the Introduction (“There is nothing more practical than a good
theory”), the SP theory, with its broad base of support, has many potential
benefits and applications, as outlined in Appendix C.

• Strong theoretical underpinnings. Compared with, for example, artificial
neural networks and deep learning, the SP system appears to have much stronger
and more coherent theoretical underpinnings. The multiple alignment
framework provides a simple but powerful realisation of the concept on which the
SP theory is founded: information compression via the matching and
unification of patterns.

• The SP theory is a theory of computing. Most other research is founded on
the idea that computing may be understood in terms of the Universal Turing
Machine or equivalent models such as Lambda Calculus or Post’s Canonical
System. By contrast, the SP theory is itself a theory of computing [19,
Chapter 4]. What is distinctive about the SP theory as a theory of computing is
that it provides much of the human-like intelligence that is missing from
earlier models.³⁵

35 Although Alan Turing saw that computers might become intelligent [2], the Universal 
Turing Machine, in itself, does not tell us how! The SP theory goes some way towards plugging 
the gap, with potential to do more. 
• Information compression via the matching and unification of patterns. In
trying to cut through complexities, the SP research programme focuses on a 
simple, ‘primitive’ idea: that information compression may be understood as 
a search for patterns that match each other, with the merging or ‘unification’ 
of patterns that are the same. The potential advantage of this approach, 
focussing on the simple concept of matching and unifying patterns, is that 
it can help us avoid old tramlines, and open doors to new ways of thinking. 

• Multiple alignment. More specifically, information compression via the
matching and unification of patterns provides the basis for a concept of
multiple alignment, borrowed and adapted from that concept in bioinformatics.
Developing this idea as a framework for the simplification and integration
of concepts across a broad canvas has been a major undertaking. Multiple
alignment is a distinctive and powerful idea in the SP research programme.

• Transparency in the representation and processing of knowledge. By
contrast with sub-symbolic approaches to artificial intelligence (artificial neural
networks, deep learning, and related approaches), and notwithstanding
objections to symbolic AI,³⁶ knowledge in the SP system is transparent and
open to inspection, and likewise for the processing of knowledge.

• The DONSVIC principle. A related point is that unsupervised learning in
the SP system is geared to the realisation of the “DONSVIC” principle:
The Discovery of Natural Structures Via Information Compression [21,
Section 5.2]. By contrast with sub-symbolic approaches to artificial intelligence,
structures created by learning should, normally, be comprehensible by
people.


• Perception and cognition. The SP theory draws extensively on research on 
human and animal perception and cognition, and neuroscience. In particular, 
an important part of its inspiration is research on the learning of natural 
language (see www.cognitionresearch.org/lang_learn.html). 


• SP-neural. The SP theory includes proposals—SP-neural—for how abstract
concepts in the theory may be realised in terms of neurons and neural
processes. The SP-neural proposals (Appendix A.6) are significantly different
from artificial neural networks as commonly conceived in computer science,
and arguably more plausible in terms of neuroscience.


36 See, for example, “Hubert Dreyfus’s views on artificial intelligence”, Wikipedia , 
bit.ly/lhGHVm8j retrieved 2014-08-19. 
C Some potential benefits and applications of
the SP system 


With the SP system, there is a large range of potential benefits and applications, 
some of which are outlined in the following subsections. 

Some potential applications may be developed on relatively short timescales
using existing high-performance computers or even ordinary computers. These
include the SP system as an intelligent database (Appendix C.6), and applications in
such areas as medical diagnosis (Appendix C.13.3), pattern recognition (Appendix
C.5), information compression (Appendix C.13.5), highly-economical transmission
of information (Appendix C.13.1, [23, Section VIII]), bioinformatics (Appendix
C.13.4), and natural language processing (Appendix C.4.3).

More radical solutions, that may take longer to develop, include a radically
new architecture for computers (Appendix C.2), and developing the full potential
of the system in computer vision (Appendix C.7), and natural language processing
(Appendix C.4).


C.1 Simplification and integration of concepts

Although some people may argue otherwise, the world of computing suffers from a
deep malaise: its fragmentation into a myriad of concepts, shown schematically in
Figure 5, a myriad of different formalisms and formats for representing knowledge,
each with its own mode of processing [23, Section III], and the extraordinary
complexity of many computing systems, especially software.

The quest for simplification and integration in the SP theory accords with
Occam’s Razor, one of the most widely-accepted principles in science. In terms
of that principle, the SP theory seeks to combine conceptual simplicity with
descriptive and explanatory power:

• A relatively simple conceptual framework—multiple alignment—provides an 
account of a wide range of concepts and phenomena mm- 

• There is potential for very substantial simplification of computing systems,
taking hardware and software together [26, Section 5], and for seamless
integration of diverse structures and functions [26, Sections 2 and 7].

• Like any theory that simplifies and integrates a good range of observations 
and concepts, the SP theory promises deeper insights and better solutions 
to problems than may otherwise be achieved [2B1 Section 6], [SB, Section 
IV-A3], 
Figure 5: Schematic representation of the fragmentation of computer science.

C.2 A radically new architecture for computers 

The SP system has the potential to provide the basis for a radically new
architecture for computers, either via the abstract model (with a central role for multiple
alignment) or, more likely, via SP-neural. Potential benefits of this new
architecture would be human-like versatility and adaptability in computing, substantial
simplification of software, and dramatic reductions in the energy demands of
computers, and in their size and weight [23, Section IX].


C.3 Unsupervised learning 


The SP theory originates in part from an earlier programme of research on
grammatical inference and the unsupervised learning of natural language, with
minimum length encoding as a central principle [16], [23, Section V-A3]. However,
meeting the goals of the SP research programme has meant a radical
reorganisation of computer models, with the development of multiple alignment as a
framework for the simplification and integration of diverse structures and functions [23,
Section V-A4]. The new model demonstrates capabilities in grammatical inference
([19, Chapter 9], Appendix A.3.3) and appears to have considerable potential for
further development with linguistic, visual, and other kinds of knowledge.
C.4 Natural language processing

In addition to the learning of linguistic knowledge (Appendix C.3), the SP
system has strengths in the parsing of natural language, the production of natural
language, and the integration of syntactic and semantic knowledge, as outlined in
this section. These aspects of the system are described more fully in [21, Section
8] and in [19, Chapter 5].


C.4.1 Parsing of natural language 

Figure 6 shows how, via multiple alignment, a sentence (in row 0) may be parsed
in terms of grammatical structures including words (rows 1 to 8).³⁷ It also shows,
in row 8, how the system may mark the syntactic dependency between the plural
subject of the sentence (‘Np’) and the plural main verb (‘Vp’) (see also [19, Sections
5.4 and 5.5], [21, Section 8.1]).
Figure 6: The best multiple alignment created by the SP model with a store of Old
patterns like those in rows 1 to 8 (representing grammatical structures, including 
words) and a New pattern (representing a sentence to be parsed) shown in row 0. 


To create a multiple alignment like the one in the figure, the system needs a 
grammar of Old patterns, like those shown, one per row, in rows 1 to 8 of the figure. 
In this example, the patterns represent linguistic structures including words. 

Although SP patterns are remarkably simple, it appears that, within the
multiple alignment framework, they have at least the expressive power of a
context-sensitive grammar [19, Sections 5.4 and 5.5]. As previously noted (Appendix A.2),
there is reason to believe that, within the multiple alignment framework, all kinds
of knowledge may be represented by SP patterns.

37 Compared with the multiple alignment shown in Figure 3, this multiple alignment is
rotated through 90°, replacing columns with rows. The choice between these two styles, which 
are equivalent, depends largely on what fits best on the page. 
C.4.2 Production of natural language

A neat feature of the SP system is that one set of mechanisms may achieve both
the analysis or parsing of natural language (Appendix C.4.1) and the generation
or production of sentences. This is explained in [19, Section 3.8] and [21, Section
4.5].


C.4.3 The integration of syntax and semantics 

The use of one simple format for all kinds of knowledge is likely to facilitate the
seamless integration of syntax and semantics. Preliminary examples of how this
may be done are shown in [19, Section 5.7], both for the derivation of meanings
from surface forms [19, Figure 5.18] and for the production of surface forms from
meanings [19, Figure 5.19].

It appears that, on relatively short timescales, there is potential via the SP 
system to create natural language interfaces for such things as internet TVs, DVD 
recorders and the like, and thus help to overcome much of the difficulty that many 
people now experience in controlling these things. 


C.4.4 Parallel streams of information 

Up to now, most work on natural language within the SP research programme
has made the simplifying assumption that language may be represented with a
sequence of symbols, as in ordinary text. But with some aspects of natural language
such as formants in speech, and the relationship between syntax and semantics,
there seem to be parallel streams of information. The way in which such
parallelism may be represented and processed with 2D patterns in the SP system is
described in [23, Section IV-B4 and Appendix C].


C.5 Pattern recognition 

As described quite fully in [19, Chapter 6] and more briefly in [21, Section 9], the
SP system has strengths in several aspects of pattern recognition:

• It can recognise patterns at multiple levels of abstraction, with the
integration of class-inclusion relations and part-whole relations, as shown in the
example in Figure 3.

• It can model “family resemblance” or polythetic categories, meaning that
recognition does not depend on the presence or absence of any particular feature
or combination of features.

• Recognition is robust in the face of errors of omission, commission or
substitution in the New pattern or patterns.


• For any given identification, or any related inference, the SP system may 
calculate associated probabilities. 

• As a by-product of how recognition is achieved via the building of multiple 
alignments, the system provides a model for the way in which context may 
influence recognition. 
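
The idea of polythetic categories, and of recognition that tolerates errors of omission, can be illustrated with a simple best-match ranking. This is a sketch under stated assumptions rather than the SP system's method, which works via multiple alignment and calculates probabilities: the function name `rank_categories`, the scoring rule (the fraction of a category's features found in the New information) and the feature sets are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def rank_categories(
    new: FrozenSet[str], old: Dict[str, FrozenSet[str]]
) -> List[Tuple[str, float]]:
    """Rank categories by the fraction of their features present in `new`.

    No single feature is required, so a category can still be the best
    match when some of its features are missing from `new`.
    """
    scores = [
        (name, len(new & features) / len(features))
        for name, features in old.items()
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical categories and their features.
old_patterns = {
    "buttercup": frozenset(
        {"yellow petals", "hairy stem", "numerous stamens", "meadows"}
    ),
    "daisy": frozenset(
        {"white petals", "smooth stem", "composite flower", "meadows"}
    ),
}

# New information with one feature of the buttercup omitted ('hairy stem').
new_pattern = frozenset({"yellow petals", "numerous stamens", "meadows"})

ranking = rank_categories(new_pattern, old_patterns)
print(ranking)
```

Here the first category remains the best match despite the omitted feature, and the second still receives a lower, non-zero score because one feature is shared.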

C.6 Information storage and retrieval, with intelligence 

The SP theory provides a versatile model for database systems, with the ability to
accommodate object-oriented structures, as well as relational ‘tuples’, and network
and tree models of data [20]. It lends itself most directly to information retrieval
in the manner of query-by-example but it appears to have potential to support the
use of natural language or query languages such as SQL.

Unlike some ordinary database systems: 

• The storage and retrieval of information is integrated with other aspects of 
intelligence such as pattern recognition, reasoning, planning, problem solv¬ 
ing, and learning—as outlined elsewhere in this document. 

• The SP system provides a simple but effective means of combining class
hierarchies with part-whole hierarchies, with inheritance of attributes (Appendix
A.4).

• It provides for cross-classification with multiple inheritance.

• There is flexibility and versatility in the representation of knowledge arising
from the fact that the system does not distinguish ‘parts’ and ‘attributes’
[20, Section 4.2.1].

• Likewise, the absence of a distinction between ‘class’ and ‘object’ facilitates
the representation of knowledge and eliminates the need for a ‘metaclass’
[20, Section 4.2.2].

• SP patterns provide a simpler and more direct means of representing
entity-relationship models than do relational tuples [20, Section 4.2.3].
C.7 Vision

With generalisation of the SP system to accommodate 2D patterns, it has
potential to model several aspects of natural vision and to facilitate the development
of human-like abilities in artificial vision [22]. In these connections, the main
strengths and potential of the SP system are:

• Low level perceptual features such as edges or corners may be identified via 
the multiple alignment framework by the extraction of redundancy in uniform 
areas in the manner of the run-length encoding technique for information 
compression. 

• The system may be applied in the recognition of objects and in scene analysis,
with the same strengths as in pattern recognition (Appendix C.5).

• There is potential for the learning of visual entities and classes of entity and
the piecing together of coherent concepts from fragments [22, Section 5].

• There is potential, via multiple alignment, for the creation of 3D models of
objects and of surroundings [22, Section 6].

• The SP theory provides an account of how we may see things that are not 
objectively present in an image, how we may recognise something despite 
variations in the size of its retinal image, and how raster graphics and vector 
graphics may be unified. 

• And the SP theory has things to say about the phenomena of lightness 
constancy and colour constancy, ambiguities in visual perception, and the 
integration of vision with other senses and other aspects of intelligence. 
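
The first point in the list above, that edges may be found by extracting redundancy from uniform areas in the manner of run-length encoding, can be illustrated for a single row of pixels. This is a minimal sketch of the general technique and not the multiple alignment process: the function names `run_length_encode` and `edge_positions` and the pixel values are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(row: List[int]) -> List[Tuple[int, int]]:
    """Encode a row of pixel values as (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


def edge_positions(row: List[int]) -> List[int]:
    """Return the indices at which a new run starts after the first run.

    Each such index marks a boundary between two uniform areas, which is
    a candidate position for an edge.
    """
    positions: List[int] = []
    index = 0
    for _, length in run_length_encode(row)[:-1]:
        index += length
        positions.append(index)
    return positions


# Hypothetical row of pixels: a light region between two dark regions.
row = [0, 0, 0, 1, 1, 1, 1, 0, 0]
print(run_length_encode(row))
print(edge_positions(row))
```

The uniform runs are where the redundancy lies, and the places where one run gives way to another are where the information about edges is concentrated.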

C.8 Reasoning 

As described quite fully in [19, Chapters 7 and 10, Section 6.4] and more
selectively in [21, Section 10], the SP system lends itself to several kinds of reasoning:

• One-step ‘deductive’ reasoning. 

• Abductive reasoning. 

• Reasoning with probabilistic decision networks and decision trees. 

• Reasoning with ‘rules’. 

• Nonmonotonic reasoning and reasoning with default values. 
• Reasoning in Bayesian networks, including “explaining away”.

• Causal diagnosis. 

• Reasoning which is not supported by evidence. 

• Inheritance of attributes in an object-oriented class hierarchy or heterarchy. 

There is also potential for spatial reasoning [23, Section IV-F1] and what-if
reasoning [23, Section IV-F2].

These several kinds of reasoning may work together seamlessly without
awkward incompatibilities, and likewise for how they may integrate seamlessly with
such AI functions as unsupervised learning, pattern recognition, and so on [21,
Sections 2, 4, and 7].

For any given inference reached via any of these kinds of reasoning, the SP
system may calculate associated probabilities (Appendix A.5).

Although the system is fundamentally probabilistic, it may imitate the effect
of logic and other ‘exact’ forms of reasoning [19, Section 10.4.5].

C.9 Planning and problem solving 

With data about flights between different cities, represented using SP patterns,
the SP computer model may find a route between any two cities (if such a route
exists) and, if there are alternative routes, it may find them as well [19, Section
8.2].
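
The route-finding task itself can be stated compactly in conventional code. The sketch below uses an ordinary depth-first enumeration of routes, which is a swapped-in technique and not the way the SP computer model solves the problem with SP patterns and multiple alignment. The function name `find_routes` and the flight data are hypothetical.

```python
from typing import Dict, List


def find_routes(
    flights: Dict[str, List[str]], origin: str, destination: str
) -> List[List[str]]:
    """Return every route that visits no city twice, shortest first."""
    routes: List[List[str]] = []

    def extend(path: List[str]) -> None:
        city = path[-1]
        if city == destination:
            routes.append(path)
            return
        for next_city in flights.get(city, []):
            if next_city not in path:  # avoid going round in circles
                extend(path + [next_city])

    extend([origin])
    return sorted(routes, key=lambda route: (len(route), route))


# Hypothetical direct flights: each city maps to the cities it flies to.
flights = {
    "Beijing": ["Delhi", "Cape Town"],
    "Delhi": ["Zurich"],
    "Cape Town": ["Zurich"],
    "Zurich": ["New York"],
}

for route in find_routes(flights, "Beijing", "New York"):
    print(" -> ".join(route))
```

When no route exists, the function returns an empty list, which corresponds to the qualification "if such a route exists" in the text.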

Provided they are translated into textual form, the SP computer model can
solve geometric analogy problems of the kind found in puzzle books and some IQ
tests [19, Section 8.3], [21, Section 12].

C.10 Software engineering 

Although it may not seem obvious at first sight, the multiple alignment framework
can model several devices used in ordinary procedural programming, including:
procedure, function, or subroutine; variable, value and type; function with
parameters; conditional statement; and the means of repeating operations such as repeat
... until or do ... while [26, Section 6.6.1]. In accordance with good practice in
software engineering, the SP system facilitates the integration of ‘programs’ with
‘data’. And as previously noted (Appendix A.4), the SP system supports
object-oriented concepts such as class hierarchies with inheritance of attributes.

In [26, Section 6.6.3], it is suggested that, since SP patterns at the ‘top’ level
are independent of each other, they may serve to model processes that may run
in parallel. Now it appears that a better option is to model parallel processes as
parallel streams of information, represented in 2D SP patterns as described in [23,
Appendix C]. The advantage of this latter scheme is that it provides the means of
showing when two or more events occur at the same time and, more generally, the
relative timings of events.

Within the SP system, these structures and mechanisms may serve in the 
representation and processing of sequential and parallel procedures from the real 
world such as those required for cooking a meal, organising a party, going shopping, 
and so on. 

Potential benefits in software engineering include the elimination of compiling 
or interpretation, automatic programming, benefits in verification and validation, 
and helping to overcome the problem of technical debt [20, Section 6.6]. 

C.11 Mathematics

Aspects of mathematics may be understood in terms of some basic techniques for
information compression: chunking-with-codes, schema-plus-correction, and
run-length coding [23], and some features of mathematics may be modelled in the SP
system [19, Chapter 10].

On the strength of this evidence and some other considerations, there is reason
to believe that all of mathematics may be understood in terms of information
compression. There appear to be considerable implications for mathematics, and also
for science, because of the importance of mathematics as a language of science.


C.12 Human perception and cognition, and neuroscience 


Part of the inspiration for the SP theory has been earlier research on the role of
information compression in the workings of brains and nervous systems [1, 2, 3],
and a programme of research on the learning of a first language or languages [16].
Thus there is reason to believe that the SP theory may help to illuminate human
perception and cognition.

Since the converse is true—that insights into the nature of human perception 
and cognition may help to inform the development of the SP theory—the two 
things are often considered together, as can be seen in writings about the SP
system. 

As outlined in Appendix A.6, the SP theory includes the proposal, called
SP-neural, that abstract concepts in the theory may be realised in terms of neurons 
and their interconnections [El Chapter 11]. 

C.13 Other potential benefits and applications 

The versatility of the SP system may be seen not only in the areas outlined above 
but in potential benefits and applications summarised in the following subsections. 

C.13.1 Big data 

The SP system may help to solve nine problems associated with big data [23]. In 
brief, these problems and their potential solutions are: 

• Overcoming the problem of variety in big data. Harmonising diverse kinds 
of knowledge, diverse formats for knowledge, and their diverse modes of 
processing, via a universal framework for the representation and processing 
of knowledge (UFK). 

• Learning and discovery. The unsupervised learning or discovery of ‘natural’ 
structures in data. 

• Interpretation of data. The system has strengths in areas such as pattern 
recognition, information retrieval, parsing and production of natural language,
translation from one representation to another, several kinds of reasoning,
planning and problem solving. 

• Velocity: analysis of streaming data. The SP system lends itself to an
incremental style, assimilating information as it is received, much as people 
do. 

• Volume: making big data smaller. Reducing the size of big data via lossless 
compression can yield several benefits. 

• Transmission of data. There is potential for substantial economies in the 
transmission of data by judicious separation of ‘encoding’ and ‘grammar’. 

• Energy, speed, and bulk. There is potential for big cuts in the use of energy 
in computing, for greater speed of processing with a given computational 
resource, and for corresponding reductions in the size and weight of
computers. 

• Veracity: managing errors and uncertainties in data. The SP system can 
identify possible errors or uncertainties in data, suggest possible corrections 
or interpolations, and calculate associated probabilities. 

• Visualisation. Knowledge structures created by the system, and inferential 
processes in the system, are all transparent and open to inspection. They 
lend themselves to display with static and moving images. 
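
The point about transmission can be pictured with a minimal Python sketch in which sender and receiver already share a 'grammar' of recurring chunks, so that only a short 'encoding' needs to be sent. This is a plain chunking-with-codes illustration under stated assumptions, not the SP system's own mechanism: the `GRAMMAR` table, the `%n` code format and the function names are hypothetical, and the sketch assumes the codes never occur in the raw message.

```python
# Hypothetical shared "grammar": short codes standing for recurring chunks.
# Both sender and receiver are assumed to hold a copy of this table already.
GRAMMAR = {
    "%1": "the patient",
    "%2": "was admitted",
    "%3": "was discharged",
}


def encode(message, grammar):
    """Sender side: replace each known chunk with its short code."""
    for code, chunk in grammar.items():
        message = message.replace(chunk, code)
    return message


def decode(encoding, grammar):
    """Receiver side: expand each code back into its chunk."""
    for code, chunk in grammar.items():
        encoding = encoding.replace(code, chunk)
    return encoding


message = "the patient was admitted; the patient was discharged"
encoding = encode(message, GRAMMAR)   # "%1 %2; %1 %3"
restored = decode(encoding, GRAMMAR)  # identical to the original message
```

Only the encoding travels over the channel. The saving grows with the number of messages that reuse the same grammar.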

Considering these proposed solutions collectively, and in several cases
individually, it appears that there are no alternatives that can rival the
potential of what is described in [23]. 


C.13.2 Autonomous robots 


The SP system may help in the design of autonomous robots, meaning robots 
that do not depend on external intelligence or power supplies, are mobile, and are 
designed to exhibit as much human-like intelligence as possible [23]. The three 
main areas where it may make a contribution are summarised in Section 2.4:
problems with the efficiency and bulkiness of computers; developing human-like
versatility in robots; and developing human-like adaptability in robots. 


C.13.3 Medical diagnosis 

The way in which the SP system may be applied in medical diagnosis is described 
in [18]. The expected benefits of the SP system in that area of application include: 

• A format for representing diseases that is simple and intuitive. 

• An ability to cope with errors and uncertainties in diagnostic information. 

• The simplicity of storing statistical information as frequencies of occurrence 
of diseases. 

• The system provides a method for evaluating alternative diagnostic
hypotheses that yields true probabilities. 

• It is a framework that should facilitate the unsupervised learning of medical 
knowledge and the integration of medical diagnosis with other AI
applications. 
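
To make the idea of storing frequencies of occurrence concrete, the Python sketch below turns frequency counts for a set of candidate diagnoses into relative probabilities by simple normalisation. This is a deliberately simplified illustration and not the SP system's own method of calculating probabilities; the disease names and counts are made-up values, and the function name `relative_probabilities` is an assumption.

```python
def relative_probabilities(frequencies):
    """Normalise frequency counts so that the values sum to 1."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("at least one candidate must have a non-zero frequency")
    return {name: count / total for name, count in frequencies.items()}


# Made-up illustrative counts for diseases that all match the observed symptoms.
candidates = {"influenza": 60, "common cold": 30, "pneumonia": 10}
probabilities = relative_probabilities(candidates)
ranked = sorted(probabilities, key=probabilities.get, reverse=True)
```

Here `ranked` lists the candidate diagnoses from most to least probable under these assumed counts.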

The main emphasis in [18] is on medical diagnosis as pattern recognition. But 
the SP system may also be applied to causal diagnosis ([19, Section 7.9], [21, Section 
10.5]) so that it may be possible, for example, to reason that “The patient’s fatigue 
may be caused by anemia which may be caused by a shortage of iron in the diet”. 


C.13.4 Bioinformatics 

Because of the central importance of multiple alignment in the SP system, and 
because of the importance of that concept in bioinformatics, there is clear potential 
for the SP system to find applications in that area [26, Section 6.10.2]. Multiple 
alignment as it has been developed in the SP system has potential advantages 
compared with multiple alignment as it has been developed for bioinformatics. 

Another potential advantage of the SP system is its capabilities and potential 
in unsupervised learning (Appendices A.3.3 and C.3, and sources referenced there): 


• They have potential to discover recurrent patterns of arbitrary size within 
DNA sequences, amino acid sequences, and the like—including sequences 
that are discontinuous, jumping over intervening structures. 

• They have potential to discover disjunctive classes of entity, with the context 
in which they are embedded. 

• With the right kind of data, there is potential to discover structures and 
associations within and between DNA, proteins, enzymes, and associated 
biochemical processes. 

• Likewise, there is potential to discover associations between diseases and 
their symptoms on the one hand, and biochemical structures and processes 
on the other. 
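
The first point in the list above can be illustrated, in a much reduced form, by the following Python sketch, which finds fixed-length substrings that recur in a DNA sequence. It is a conventional k-mer count and not the SP learning algorithm: it handles only contiguous patterns of one chosen length, whereas the SP system is envisaged as discovering patterns of arbitrary size, including discontinuous ones. The function name `recurrent_kmers` and the example sequence are assumptions.

```python
from collections import Counter


def recurrent_kmers(sequence, k, min_count=2):
    """Return every length-k substring that occurs at least min_count times."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Made-up example sequence in which "ACGT" recurs three times.
repeats = recurrent_kmers("ACGTTGACGTAACGT", k=4)   # {'ACGT': 3}
```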


C.13.5 Information compression 


Since information compression is central in the workings of the SP system
(Appendix A.3), there is reason to believe that an industrial-strength version of the 
system will be useful in that area of application. 